 statistical dependency




Simplicial Gaussian Models: Representation and Inference

arXiv.org Machine Learning

Thus, they are widely used in several applications, including computer vision, computational biology, and spatial statistics [2, 3, 4]. In a PGM, random variables are associated with the vertices of a graph, while edges encode statistical dependencies. The meaning of the edges depends on the graph type: Bayesian Networks capture directional dependencies through directed acyclic graphs (DAGs) [5], whereas Markov Random Fields (MRFs) model symmetric conditional dependencies with undirected graphs, thanks to the Markov property [6]. A well-studied family is Gaussian Markov Random Fields (GMRFs), i.e., MRFs that model Gaussian random variables [7]. Indeed, conditional dependencies in the Gaussian distribution are encoded by the precision matrix, which allows GMRFs to be learned from data with efficient algorithms [8]. However, PGMs are inherently limited to graphs. First, PGMs typically associate random variables with individual nodes (sets of cardinality one), whereas in many settings random quantities are naturally associated with larger sets. Examples include data traffic in communication networks or water flows in distribution networks, where measurements are collected on the links of the networks [9, 10, 11]. Second, PGMs are restricted to modeling pairwise dependencies via edges.
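The statement that the precision matrix encodes conditional dependencies can be made concrete. For a Gaussian vector with precision matrix Q (the inverse covariance), x_i and x_j are conditionally independent given all other variables exactly when Q_ij = 0, so the nonzero off-diagonal entries of Q are the edges of the GMRF. The sketch below is a minimal illustration in plain Python, assuming a hypothetical 3-variable chain precision matrix; it does not implement the learning algorithms cited above. It shows that x0 and x2 are marginally correlated (nonzero covariance) yet have zero partial correlation given x1.

```python
import math


def invert(m):
    """Invert a small square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def partial_correlations(q):
    """Correlation of x_i and x_j given all other variables: -Q_ij / sqrt(Q_ii * Q_jj)."""
    n = len(q)
    return [[1.0 if i == j else -q[i][j] / math.sqrt(q[i][i] * q[j][j])
             for j in range(n)] for i in range(n)]


def gmrf_edges(q, tol=1e-12):
    """Undirected GMRF edges: index pairs with a nonzero off-diagonal precision entry."""
    n = len(q)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if abs(q[i][j]) > tol]


# Hypothetical chain x0 - x1 - x2 (precision matrix chosen only for illustration).
Q = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
cov = invert(Q)                # covariance matrix = Q^{-1}
pc = partial_correlations(Q)
print(gmrf_edges(Q))           # [(0, 1), (1, 2)]
print(round(cov[0][2], 6))     # 0.25: x0 and x2 are marginally dependent
print(abs(pc[0][2]))           # 0.0: conditionally independent given x1
```

Reading the graph off Q rather than off the covariance is the point: the covariance matrix here is dense, while the precision matrix is sparse and matches the chain structure.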




Disentanglement in Difference: Directly Learning Semantically Disentangled Representations by Maximizing Inter-Factor Differences

arXiv.org Artificial Intelligence

In this study, Disentanglement in Difference (DiD) is proposed to address the inherent inconsistency between the statistical independence of latent variables and the goal of semantic disentanglement in disentangled representation learning. Conventional disentanglement methods achieve disentangled representations by improving statistical independence among latent variables. However, statistical independence of latent variables does not necessarily imply that they are semantically unrelated; thus, improving statistical independence does not always enhance disentanglement performance. To address this issue, DiD is proposed to directly learn semantic differences rather than the statistical independence of latent variables. In DiD, a Difference Encoder is designed to measure semantic differences, and a contrastive loss function is established to facilitate inter-dimensional comparison. Together, these allow the model to directly differentiate and disentangle distinct semantic factors, thereby resolving the inconsistency between statistical independence and semantic disentanglement. Experimental results on the dSprites and 3DShapes datasets demonstrate that the proposed DiD outperforms existing mainstream methods across various disentanglement metrics.


Simultaneous Dimensionality Reduction for Extracting Useful Representations of Large Empirical Multimodal Datasets

arXiv.org Artificial Intelligence

The quest for simplification in physics drives the exploration of concise mathematical representations for complex systems. This Dissertation focuses on dimensionality reduction as a means to obtain low-dimensional descriptions from high-dimensional data, facilitating comprehension and analysis. We address the challenges posed by real-world data that defy conventional assumptions, such as complex interactions within neural systems or high-dimensional dynamical systems. Leveraging insights from both theoretical physics and machine learning, this work unifies diverse reduction methods under a comprehensive framework, the Deep Variational Multivariate Information Bottleneck. This framework enables the design of tailored reduction algorithms based on specific research questions. We explore and assert the efficacy of simultaneous reduction approaches over their independent reduction counterparts, demonstrating their superiority in capturing covariation between multiple modalities while requiring less data. We also introduce novel techniques, such as the Deep Variational Symmetric Information Bottleneck, for general nonlinear simultaneous reduction. We show that the same principle of simultaneous reduction is the key to efficient estimation of mutual information. We show that our new method is able to discover the coordinates of high-dimensional observations of dynamical systems. Through analytical investigations and empirical validations, we shed light on the intricacies of dimensionality reduction methods, paving the way for enhanced data analysis across various domains. We underscore the potential of these methodologies to extract meaningful insights from complex datasets, driving advancements in fundamental research and applied sciences. As these methods evolve, they promise to deepen our understanding of complex systems and inform more effective data analysis strategies.


Time delay multi-feature correlation analysis to extract subtle dependencies from EEG signals

arXiv.org Artificial Intelligence

Electroencephalography (EEG) signals result from extremely complex brain activity. Some details of these hidden dynamics might be accessible through, e.g., joint distributions $\rho_{\Delta t}$ of signals from pairs of electrodes shifted by various time delays (lag $\Delta t$). A standard approach is to monitor a single summary of such joint distributions, such as Pearson correlation (or mutual information), which turns out to be relatively uninteresting: as expected, there is usually a small peak at zero delay and a nearly symmetric drop with increasing delay. In contrast, such a complex signal might be composed of multiple types of statistical dependencies; this article proposes an approach to automatically decompose and extract them. Specifically, we model such joint distributions as polynomials, estimated separately for each considered lag, and then use PCA dimensionality reduction to find the dominant joint density distortion directions $f_v$. This yields a few lag-dependent features $a_i(\Delta t)$ describing separate dominant statistical dependencies with known contributions: $\rho_{\Delta t}(y,z)\approx \sum_{i=1}^r a_i(\Delta t)\, f_{v_i}(y,z)$. Such features complement Pearson correlation by extracting hidden, more complex behavior, e.g., asymmetry that might be related to the direction of information transfer, extrema suggesting characteristic delays, or oscillatory behavior suggesting some periodicity. An extension of Granger causality to such multi-feature joint density analysis is also discussed, suggesting, e.g., two separate causality waves. While this article presents early, fundamental research, in the future it might help with, e.g., understanding hidden cortical dynamics, diagnosing pathologies such as epilepsy, determining precise electrode positions, or building brain-computer interfaces.
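The lag-by-lag polynomial fit followed by PCA can be sketched as follows. This is a minimal sketch under assumptions the abstract does not state: each signal is first mapped to [0, 1] by its empirical CDF; in those coordinates the joint density is modeled as 1 plus a combination of products $f_j(y) f_k(z)$ of orthonormal shifted Legendre polynomials of degree 1 to 3, whose coefficients are then plain sample means; and PCA is done as an uncentered SVD of the lag-by-coefficient matrix, with only the first direction $v_1$ found by power iteration. All function names and the synthetic delayed signal are hypothetical.

```python
import math
import random


def legendre01(u):
    """Orthonormal shifted Legendre polynomials of degree 1..3 on [0, 1]."""
    return [
        math.sqrt(3) * (2 * u - 1),
        math.sqrt(5) * (6 * u * u - 6 * u + 1),
        math.sqrt(7) * (20 * u ** 3 - 30 * u * u + 12 * u - 1),
    ]


def to_uniform(xs):
    """Empirical-CDF normalization: replace each value by (rank + 0.5) / n."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    u = [0.0] * len(xs)
    for rank, i in enumerate(order):
        u[i] = (rank + 0.5) / len(xs)
    return u


def lag_coefficients(uy, uz, lag):
    """a_jk = mean over t of f_j(y_t) * f_k(z_{t+lag}), the coefficients of the
    assumed density model rho(y, z) = 1 + sum_jk a_jk f_j(y) f_k(z)."""
    m = len(uy) - lag
    coef = [0.0] * 9
    for t in range(m):
        fy, fz = legendre01(uy[t]), legendre01(uz[t + lag])
        for j in range(3):
            for k in range(3):
                coef[3 * j + k] += fy[j] * fz[k] / m
    return coef


def dominant_direction(rows, iters=200):
    """Top right singular vector of the lag-by-coefficient matrix M
    (power iteration on M^T M, no centering)."""
    dim = len(rows[0])
    v = [1.0] * dim
    for _ in range(iters):
        mv = [sum(r[i] * v[i] for i in range(dim)) for r in rows]
        w = [sum(mv[row] * rows[row][i] for row in range(len(rows))) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def lag_features(y, z, max_lag):
    """Return a_1(lag) for lag = 0..max_lag and the coefficient vector of f_{v_1}."""
    uy, uz = to_uniform(y), to_uniform(z)
    rows = [lag_coefficients(uy, uz, lag) for lag in range(max_lag + 1)]
    v = dominant_direction(rows)
    a1 = [sum(r[i] * v[i] for i in range(len(v))) for r in rows]
    return a1, v


# Hypothetical synthetic data: z repeats y with a delay of 3 samples plus noise.
random.seed(0)
n = 2000
y = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(3)] + [
    y[t - 3] + 0.5 * random.gauss(0, 1) for t in range(3, n)
]
a1, v1 = lag_features(y, z, max_lag=6)
peak = max(range(len(a1)), key=lambda lag: abs(a1[lag]))
print("dominant lag:", peak)
```

Further features $a_2, a_3, \dots$ would come from the next singular vectors (e.g., by deflating $M$ after removing the first component); only the first is computed here to keep the sketch short.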


Measuring Statistical Dependencies via Maximum Norm and Characteristic Functions

arXiv.org Artificial Intelligence

In this paper, we focus on the problem of statistical dependence estimation using characteristic functions. We propose a statistical dependence measure based on the maximum norm of the difference between the joint and product-marginal characteristic functions. The proposed measure can detect arbitrary statistical dependence between two random vectors of possibly different dimensions, is differentiable, and is easily integrable into modern machine learning and deep learning pipelines. We also conduct experiments with both simulated and real data. Our simulations show that the proposed method can measure statistical dependencies in high-dimensional, non-linear data and is less affected by the curse of dimensionality than previous work in this line of research. The experiments with real data demonstrate the potential applicability of our statistical measure in two different empirical inference scenarios, showing statistically significant improvements in performance when applied to supervised feature extraction and deep neural network regularization. In addition, we provide a link to the accompanying open-source repository https://bit.ly/3d4ch5I.
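The quantity behind this measure, $\max_{s,t} |\varphi_{XY}(s,t) - \varphi_X(s)\,\varphi_Y(t)|$, is zero for all $(s,t)$ exactly when X and Y are independent. A minimal sketch with empirical characteristic functions follows. It assumes, for simplicity, that the maximum is taken over a fixed set of random Gaussian frequency pairs rather than optimized, so it is not the paper's estimator or the code in its repository. The function name and test data are hypothetical. Note that X is 2-dimensional and Y is 1-dimensional, and that the dependent Y is uncorrelated with X.

```python
import cmath
import random


def ecf_max_dependence(xs, ys, freqs):
    """Max over (s, t) in freqs of |phi_XY(s, t) - phi_X(s) * phi_Y(t)|,
    using empirical characteristic functions of paired samples.

    xs, ys: equal-length lists of tuples (dimensions of x and y may differ).
    freqs: list of (s, t) pairs with len(s) == dim(x) and len(t) == dim(y).
    """
    n = len(xs)
    best = 0.0
    for s, t in freqs:
        joint = px = py = 0j
        for x, y in zip(xs, ys):
            ex = cmath.exp(1j * sum(a * b for a, b in zip(s, x)))
            ey = cmath.exp(1j * sum(a * b for a, b in zip(t, y)))
            joint += ex * ey   # accumulates the empirical phi_XY(s, t)
            px += ex           # accumulates the empirical phi_X(s)
            py += ey           # accumulates the empirical phi_Y(t)
        best = max(best, abs(joint / n - (px / n) * (py / n)))
    return best


random.seed(1)
n = 1000
xs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
# Nonlinear dependence with zero correlation: y depends on x[0] only via its square.
ys_dep = [(x[0] ** 2 + 0.1 * random.gauss(0, 1),) for x in xs]
ys_ind = [(random.gauss(0, 1),) for _ in range(n)]
# Assumption: 50 random frequency pairs stand in for a maximization over (s, t).
freqs = [((random.gauss(0, 1), random.gauss(0, 1)), (random.gauss(0, 1),)) for _ in range(50)]
dep = ecf_max_dependence(xs, ys_dep, freqs)
ind = ecf_max_dependence(xs, ys_ind, freqs)
print(f"dependent: {dep:.3f}  independent: {ind:.3f}")
```

Because every term is a smooth function of the frequencies (and of the samples), the same expression can be maximized by gradient ascent over $(s, t)$ in a differentiable framework, which is what makes this family of measures usable as a training loss.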


Deep attentive variational inference

AIHub

Figure 1: Overview of a local variational layer (left) and an attentive variational layer (right) proposed in this post. Attention blocks in the variational layer are responsible for capturing long-range statistical dependencies in the latent space of the hierarchy.

Generative models are a class of machine learning models that are able to generate novel data samples such as fictional celebrity faces, digital artwork, and scenic images. Currently, the most powerful generative models are deep probabilistic models. This class of models uses deep neural networks to express statistical hypotheses about the data generation process and combines them with latent variable models that augment the observed data with latent (unobserved) information, in order to better characterize the procedure that generates the data of interest.
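To illustrate how an attention block can link a layer to distant layers of a latent hierarchy, rather than only to its immediate parent, here is generic scaled dot-product attention in plain Python. This is a minimal sketch, not the architecture from the post. The "context vectors" summarizing earlier layers and the query from the current layer are hypothetical inputs chosen for illustration.

```python
import math


def softmax(scores):
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention: weights = softmax(q . k / sqrt(d)) over values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights


# Hypothetical context vectors summarizing three earlier layers of a latent hierarchy.
contexts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]   # hypothetical query produced by the current layer
context, weights = attend(query, contexts, contexts)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

Each layer can assign weight to any earlier layer in a single step, here mostly to the first context, which is the sense in which attention shortens the path between distant latent variables.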